Abstract - Severe acute respiratory syndrome (SARS) is
a serious form of pneumonia which results in acute
respiratory distress and sometimes death. In this study,
we applied the reverse vaccinology approach to
determine the antigenic determinant sites present on the
viral proteins. The method incorporates the prediction of
antigenic sites, solvent accessible regions and secondary
structure, B-cell epitope prediction, and the design of
the antigenic determinant. The results of the study
suggested that the small envelope protein and the orf8
protein could be potential candidates for vaccine design.
The high-scoring antigenic peptides were designed and
optimized. It is inferred that the peptide of the small
envelope protein will make a very stable and effective
vaccine targeting the E protein of the virus, which is
responsible for the spread of the virion. This study
provides a strong, optimized potential vaccine candidate
against SARS, with a high chance of successful
immunization and a higher probability of combating this
dreadful disease.
I. INTRODUCTION

SARS-CoV Tor2 shows similarity to the group 2
coronaviruses. Its genome includes genes encoding
two replicase polyproteins (RNA-dependent RNA
polymerase, i.e., pp1a and pp1ab), encompassing
two-thirds of the genome. In addition to the
replicase, it also encodes the M (membrane) protein,
S (spike) protein, E (envelope) protein, and N
(nucleocapsid) protein. These proteins are conserved
among all coronaviruses. The genome of SARS-CoV
also encodes nine novel open reading frames (ORFs),
i.e., ORF3, 4, 7, 8, 9, 10, 11, 13, and 14.
Identification of ORFs through sequence similarity
search revealed that there are several proteins in
the SARS virion that are responsible for the
infection.

These proteins include replicase 1a, replicase 1b, the
matrix (M) protein, the spike (S) protein, the
nucleocapsid (N) protein and the small envelope (E)
protein. Coronaviruses are enveloped and contain a
positive-strand RNA with a minimum of four structural
proteins: the M protein, the small E protein, the N
protein and the S protein. These proteins are well
known for their roles in virion budding and receptor
binding. The E protein is an envelope protein with 76
amino acids. It is an important membrane component
of these viruses. Apart from this function, it also has
a role in the coronavirus virion life cycle. The major
aim of this study was to develop an optimized peptide
fragment containing neutralizing epitopes, using
bioinformatics databases and tools, that could be
used as a synthetic vaccine. For this purpose the
“Tor2” strain of the virus was selected, as it was
the first and the most severe strain, which hit the
city of Toronto, Canada, very badly.



Fig 1. Life-cycle of SARS coronavirus. 

II. MATERIALS AND METHODS 

Reverse Vaccinology approach 

Reverse vaccinology is an important and accepted
approach to vaccine design. The availability of
databases and sequences of pathogen genomes, together
with various bioinformatics software tools, has made
the search for vaccine candidates an in-silico
process. Unlike the conventional identification of
antigens, reverse vaccinology uses the whole genome
to cover the full spectrum of potential antigens. It
makes it possible to obtain individual proteins and
protein groups as vaccine candidates, including
antigens which could otherwise be missed because of
poor expression in lab conditions or because of
problems with culturing the pathogen.

Protein sequence analysis 

The whole proteome of SARS coronavirus Tor2 was
downloaded from the JCVI-CMR database (J. Craig
Venter Institute Comprehensive Microbial Resource)
(http://cmr.jcvi.org/cgi-bin/CMR/shared/Genomes.cgi).
The functional sequences of the proteins in FASTA
format were obtained from NCBI (National Center for
Biotechnology Information, www.ncbi.nlm.nih.gov).

A total of 20 functional protein sequences were
obtained, including the structural and the
non-structural proteins. After removing the
hypothetical and putative sequences, the sequences
were analyzed using the TFASTY tool (BLOSUM 45, 62
and 80) to check for similarities with the coding DNA
sequences (CDS) of Homo sapiens.
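
The removal of hypothetical and putative sequences
described above can be sketched in a few lines of
Python. This is only an illustration under stated
assumptions: the helper names parse_fasta and
drop_uncharacterized, the keyword list, and the
example records are all hypothetical, and the study
does not state how this filtering was actually done.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_uncharacterized(records, keywords=("hypothetical", "putative")):
    """Keep only records whose header contains none of the keywords."""
    return [
        (header, seq)
        for header, seq in records
        if not any(word in header.lower() for word in keywords)
    ]


# Made-up example input; headers and sequences are illustrative only.
example = """>protA small envelope protein
MYSFVS
>protB hypothetical protein
MKLV
>protC putative uncharacterized protein
MAAA
"""

kept = drop_uncharacterized(parse_fasta(example))
print([header for header, _ in kept])
```

Matching on header keywords is a simple heuristic; a
curated annotation field would be more reliable where
one is available.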

TFASTY is a local alignment tool used for
comparing a probe protein sequence to a DNA
sequence database and calculating similarities. It
allows substitutions and frame shifts within a codon.

The protein sequences which showed no similarity
with the available library sequences were further
analyzed using the BLAST tool (BLOSUM 45, 62
and 80) to check for any similarity with the
proteome of H. sapiens.

The protein BLAST tool, available at
http://blast.ncbi.nlm.nih.gov/Blast.cgi, is a
word-based local alignment algorithm used to compare
the input protein sequences with the protein sequence
library.

The protein sequences showing no significant 
similarity with the H. sapiens protein sequences were 
selected as vaccine candidates and were used for 
immunogenic analysis. 

Prediction of antigenicity 

The antigenic properties of the sequences were
computed with EMBOSS Antigenic. This tool predicts
the potentially antigenic regions of a protein
sequence using the semi-empirical method of
Kolaskar and Tongaonkar [1]. Analysis of data from
experimentally determined antigenic sites has
revealed that the hydrophobic residues Cys, Leu and
Val, if they occur on the surface of a protein, are
more likely to be part of antigenic sites.

Antigenic peptides were also computed with the
Antigenic Peptide Prediction tool
(http://imed.med.ucm.es/Tools/antigenic.pl). The
peptides predicted by both programs were selected
for further analysis and structure prediction.

Finding the location in solvent accessible region

The NetSurfP prediction server was used to predict the
solvent accessible regions of the protein, i.e. the
exposed residues on the surface. The surface
accessibility (SAA) was calculated.

Parker Hydrophilicity prediction 

A hydrophilicity scale was designed based on peptide
retention times in HPLC. The HPLC was conducted on a
reversed-phase column. To analyze the epitope region,
a window of seven residues was used. The
corresponding value of the scale was introduced for
each of the seven residues, and the arithmetic mean
of the seven residue values was assigned to the
fourth, (i+3), residue in the segment. [2]
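
The seven-residue averaging step can be illustrated
with a short Python sketch. The function names and
the scale values below are placeholders chosen for
illustration; they are not the published Parker
values, which would have to be taken from the
original reference.

```python
def window_means(values, window=7):
    """Average each full window and assign the mean to its central position.

    Positions that never sit at the centre of a full window are left as None.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    result = [None] * len(values)
    for start in range(len(values) - window + 1):
        segment = values[start:start + window]
        # For window=7 the mean goes to position start+3, the fourth residue.
        result[start + half] = sum(segment) / window
    return result


def hydrophilicity_profile(sequence, scale, window=7):
    """Map residues to scale values, then smooth with a sliding window."""
    return window_means([scale[res] for res in sequence], window)


# Placeholder scale for four residue types; NOT the Parker scale.
toy_scale = {"A": 2.0, "L": -9.0, "S": 6.0, "K": 5.0}
profile = hydrophilicity_profile("ALSKALSKA", toy_scale)
print(profile)
```

Leaving the first and last three positions undefined
avoids averaging over incomplete windows at the ends
of the sequence.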

Prediction of Protein Secondary Structure 

The secondary structure of the antigenic peptide was
predicted using SOPMA. The improved Self-Optimized
Prediction Method (SOPMA) correctly predicts 69.5%
of amino acids for a three-state description of the
secondary structure (alpha-helix, beta-sheet and
coil) in a whole database containing 126 chains of
non-homologous (less than 25% identity) proteins.
Joint prediction with SOPMA and a neural network
method (PHD) correctly predicts 82.2% of residues
for 74% of co-predicted amino acids.

Designing and optimization of the vaccine candidate

The candidate vaccine was designed and optimized 
by using SYBYL software. 

III. RESULTS AND DISCUSSION 

Selection of the vaccine candidate 

TFASTY searches a nucleic acid database, translated
in all six frames, using a protein query sequence,
and calculates similarity scores. TFASTY also
incorporates additional code for the alignment of
three-frame translations with protein sequences,
allowing for the incorporation of frame-shifts in the
database sequence translations. Because of the extra
translation step, TFASTY searches can take an
exceptionally long time. Alternatives to TFASTY are
TFASTA and TFASTX. TFASTY allows frame shifts
within codons, while TFASTX does not. TFASTA does
not allow for frame shifting.

The variance of the scores is calculated and used 
along with the regression line to determine the 
normalized score, z-opt. 

Karlin and Altschul (1990, PNAS 87:2264) indicate 
that these normalized local similarity scores can be 
described by the extreme value distribution. 
Therefore, the probability of finding normalized
scores greater than any observed z-opt, P(z > x), can
be determined. From this, the expected number of
sequences having a z-opt greater than or equal to any
observed value, E(z >= x), can be calculated as P*D,
where "D" is the number of sequences in the database.

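The relation E = P*D can be made concrete with a
small Python sketch. It assumes that the normalized
score has been standardized to zero mean and unit
variance and that it follows an extreme value
(Gumbel) distribution; the function names and the
database size used in the example are hypothetical,
and the exact scaling applied inside TFASTY may
differ.

```python
import math

# Euler-Mascheroni constant, needed to centre the Gumbel distribution.
EULER_GAMMA = 0.5772156649015329


def prob_exceeds(z):
    """P(Z > z) for a Gumbel variable with mean 0 and variance 1."""
    exponent = math.exp(-math.pi / math.sqrt(6.0) * z - EULER_GAMMA)
    # 1 - exp(-exponent), written with expm1 for numerical stability.
    return -math.expm1(-exponent)


def expected_hits(z, database_size):
    """Expected number of database sequences scoring above z (E = P * D)."""
    return prob_exceeds(z) * database_size


# Hypothetical database of 30000 sequences.
for z in (0.0, 2.0, 4.0, 6.0):
    print(z, prob_exceeds(z), expected_hits(z, 30000))
```

Because E scales linearly with D, the same score
becomes less significant as the database grows.
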
Out of a total of 20 proteins in the proteome of the
SARS-CoV Tor2 strain, the small envelope protein
(orf5) showed no similarity with the known Homo
sapiens CDS in TFASTY. Further, in BLAST, the protein
showed no significant similarity with the proteome of
Homo sapiens. Hence the protein was selected as the
potential vaccine candidate against the SARS-CoV Tor2
strain.

Prediction of Antigenic Peptides 

It is known that specific epitopes on a protein, and
not the whole protein, are responsible for inducing
an immune response. The small envelope protein is 75
residues long. Fig 2 shows the antigenic determinant
plot; the x-axis shows the residue number and the
y-axis shows the average antigenic propensity. The
average antigenic propensity for this protein is
1.1191. There is one antigenic determinant in the
sequence. The highest peak spans residues 10-61 and
the sequence is

“GTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPTVYVYSR”.


Fig 2. Prediction of antigenicity in the small
envelope protein using the Kolaskar and Tongaonkar
method (threshold = 1.000).

The average antigenic propensity is above 1.0; all
residues above 1.0 are potentially antigenic. The
highest peak in the plot indicates the antigenic site
for attachment.
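
Reading antigenic determinants off such a plot
amounts to finding runs of residues whose averaged
propensity exceeds the threshold. The Python sketch
below shows only this thresholding step; the
function name, the min_length parameter and the toy
profile are illustrative assumptions, and the
published Kolaskar and Tongaonkar propensity values
are not reproduced here.

```python
def segments_above(profile, threshold=1.0, min_length=1):
    """Return 1-based inclusive (start, end) runs where profile > threshold.

    Entries equal to None (no value available) are treated as below threshold.
    """
    segments = []
    start = None
    for index, value in enumerate(profile):
        above = value is not None and value > threshold
        if above and start is None:
            start = index                      # a new run begins
        elif not above and start is not None:
            if index - start >= min_length:    # run ended just before index
                segments.append((start + 1, index))
            start = None
    # Close a run that extends to the end of the profile.
    if start is not None and len(profile) - start >= min_length:
        segments.append((start + 1, len(profile)))
    return segments


# Made-up averaged propensities for seven residues.
toy_profile = [0.9, 1.05, 1.2, 1.1, 0.95, 1.3, 0.8]
print(segments_above(toy_profile, threshold=1.0, min_length=2))
```

The minimum length filters out isolated residues
that rise above the threshold by chance.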

Surface Accessibility Prediction

Fig 3. Emini surface accessibility prediction showing
the surface accessible region of the antigenic
determinant (threshold = 1.000).

For a hexapeptide, a surface probability Sn greater
than 1.0 indicates an increased probability of the
sequence being present on the surface. [3]


Antigenic site (sl no.)   Antigenic score   Max score position   SAA (in %)
01                        1.262             26                   44.23

Table 1. Antigenic score along with surface
accessibility (SAA) of the antigenic site.

















Solvent accessible region 

The solvent accessible region was calculated using
the NetSurfP server.


Class assignment   Amino acid   Residue no   RSA     ASA
E                  M            1            .850    170.18
E                  Y            2            .377    80.58
E                  S            3            .537    62.94
E                  F            4            .296    59.32
E                  V            5            .365    56.17
E                  S            6            .323    37.83

Table 2. Exposed area of amino acids.

E=Exposed, RSA=Relative surface accessibility,
ASA=Absolute surface accessibility

B-cell Epitope prediction 

B-cell epitopes were predicted within our antigenic
site using a neural network method to increase the
accuracy of the prediction. A higher score means a
higher probability of being an epitope. All the
peptides shown here are above the threshold value of
0.51.


Table 3. Sequences scoring above the threshold
value (0.51) for B-cell epitope prediction.

Rank   Sequence           Pos   Score
1      TLAILTALRLCAYCCN   21    0.85
2      LCAYCCNIVNVSLVKP   30    0.71


The region 30-36 of the antigenic site is shared by
the two overlapping predicted B-cell epitopes. So,
there is a high chance that this region is an
epitope.
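
The overlap can be checked directly from the
sequences. The Python sketch below locates the two
peptides of Table 3 (written without spaces) in the
antigenic site and intersects their spans; it
assumes that the reported positions are 1-based
offsets within the antigenic site, and the helper
names are illustrative.

```python
ANTIGENIC_SITE = "GTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPTVYVYSR"
EPITOPES = ["TLAILTALRLCAYCCN", "LCAYCCNIVNVSLVKP"]


def locate(peptide, parent):
    """Return the 1-based inclusive (start, end) of the first match in parent."""
    index = parent.find(peptide)
    if index < 0:
        raise ValueError(f"{peptide} not found in parent sequence")
    return index + 1, index + len(peptide)


def overlap(a, b):
    """Intersect two inclusive spans; return None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


spans = [locate(peptide, ANTIGENIC_SITE) for peptide in EPITOPES]
shared = overlap(*spans)
shared_sequence = ANTIGENIC_SITE[shared[0] - 1:shared[1]]
print(spans)            # [(21, 36), (30, 45)]
print(shared)           # (30, 36)
print(shared_sequence)  # LCAYCCN
```

Under this reading, the start positions 21 and 30
match Table 3, and the shared stretch is the
seven-residue segment LCAYCCN.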

B-cell epitope prediction is useful for the
development of antibodies against SARS infection.
Antibodies to SARS coronavirus (SARS-CoV)-specific
B-cell epitopes might recognize the pathogen and
interrupt its adherence to and penetration of host
cells. Hence, these epitopes could be useful for
diagnosis and as vaccine constituents. [4] Research
has shown that human monoclonal antibodies derived
from memory B cells can neutralize SARS
coronavirus. [5]

Hydrophilicity prediction 

Fig 4. Parker hydrophilicity prediction for the
antigenic determinant (threshold = -2.262).

A hydrophilicity scale was constructed based on
peptide retention times during high-performance
liquid chromatography (HPLC) on a reversed-phase
column. A window of seven residues was used for
analyzing the epitope region. For each of the seven
residues, the corresponding scale value was
introduced. The arithmetic mean of these seven values
was assigned to the fourth, (i+3), residue in the
segment. [6]


Secondary structure prediction 



Fig 5. Percentage of helix, beta bridge and coil
regions.

The structural analysis of the protein revealed that
alpha-helix accounts for the largest percentage of
the secondary structure, which makes the protein
hydrophilic in nature.










































Designing and Energy Optimization 

The peptide was built using SYBYL software. The
AMBER charge distribution was applied while building
the peptide, and the molecule was found to have a net
charge of 3.0. The initial energy of the molecule was
838385.780 kcal/mole.

Energy minimization was performed with the AMBER
force field using the Powell method. After energy
minimization, the molecule showed an energy of
-485.873 kcal/mole.


Bond Stretching Energy       15.747 kcal/mole
Angle Bending Energy         54.628 kcal/mole
Torsional Energy             90.566 kcal/mole
Improper Torsional Energy    1.592 kcal/mole
1-4 Van Der Waal Energy      183.692 kcal/mole
Van Der Waal Energy          -170.360 kcal/mole
1-4 Electrostatic Energy     379.065 kcal/mole
Electrostatic Energy         -1040.802 kcal/mole
TOTAL ENERGY                 -485.873 kcal/mole

Table 4. Energy breakdown of the energy-minimized
structure of the candidate vaccine.
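
As a consistency check, the individual terms of
Table 4 can be summed and compared with the reported
total. The short Python sketch below does only this
arithmetic; the dictionary keys are labels chosen
for illustration.

```python
# Energy terms of the minimized structure, in kcal/mole (values from Table 4).
ENERGY_TERMS = {
    "bond stretching": 15.747,
    "angle bending": 54.628,
    "torsional": 90.566,
    "improper torsional": 1.592,
    "1-4 van der Waals": 183.692,
    "van der Waals": -170.360,
    "1-4 electrostatic": 379.065,
    "electrostatic": -1040.802,
}

REPORTED_TOTAL = -485.873

total = sum(ENERGY_TERMS.values())
difference = abs(total - REPORTED_TOTAL)
print(f"sum of terms = {total:.3f} kcal/mole")
print(f"difference from reported total = {difference:.3f} kcal/mole")
```

The summed terms agree with the reported total to
within rounding of the individual entries.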




Fig 7. Ramachandran plot of the designed peptide
(x-axis: Phi, degrees).


Ramachandran plot analysis

PDB File : vaccinestr.pdb
Total conformations : 50

Residue   L-Helix   Sheet   R-Helix   FAR   AAR   GAR   Outside   Total
ALA       0         3       0         3     1     0     0         4
ARG       0         1       0         1     0     0     0         1
ASN       0         3       0         3     0     0     0         3
CYS       0         3       0         3     0     0     0         3
ILE       0         3       0         3     0     0     0         3
LEU       0         11      0         11    0     0     0         11
LYS       0         1       0         1     0     0     0         1
PHE       0         3       0         3     0     0     0         3
PRO       0         0       1         1     0     0     0         1
SER       0         3       0         3     0     0     0         3
THR       0         3       0         3     1     0     0         4
TYR       0         3       0         3     0     0     0         3
VAL       0         10      0         10    0     0     0         10
Total     0         47      1         48    2     0     0         50

Fully Allowed Region (48 residues)          : 96.00 %
Additionally Allowed Region (2 residues)    : 4.00 %
Total                                       : 100.00 %

Alpha helical Region (1 residues)           : 2.08 %
Beta sheet Region (47 residues)             : 97.92 %
3-10 helical Region (0 residues)            : 0.00 %
Total                                       : 100.00 %

Fig 8. Analysis of the Ramachandran plot.

The results showed that 96% of the residues (48
residues) fell in the fully allowed region (FAR) and
4% (2 residues) in the additionally allowed region
(AAR). No residues were present in the disallowed
region.
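
The totals in this analysis can be cross-checked
against the peptide sequence itself. The Python
sketch below counts the interior residues of the
52-residue antigenic peptide and recomputes the
region percentages. It assumes that the two terminal
residues are excluded because they lack a complete
phi/psi pair, which would explain the total of 50
conformations; the variable names are illustrative.

```python
from collections import Counter

PEPTIDE = "GTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPTVYVYSR"

THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}

# Assumption: the first and last residues have no full phi/psi pair.
interior = PEPTIDE[1:-1]
composition = Counter(THREE_LETTER[res] for res in interior)

# Region counts as reported in the Ramachandran analysis.
fully_allowed, additionally_allowed = 48, 2
total = fully_allowed + additionally_allowed
percent_far = 100.0 * fully_allowed / total
percent_aar = 100.0 * additionally_allowed / total

for name in sorted(composition):
    print(name, composition[name])
print(f"FAR: {percent_far:.2f} %  AAR: {percent_aar:.2f} %")
```

Under this assumption the per-residue counts add up
to the 50 conformations listed in the analysis.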

Fig 6. Designed peptide in a periodic boundary
condition (using SYBYL).

Ramachandran plot

The designed structure was validated using a
Ramachandran plot. The result showed that all the
residues fall in the allowed regions.

IV. CONCLUSION

Vaccine design is suited to the application of
in-silico techniques, for the development of new and
existing vaccines. The identification of an effective
vaccine depends on identifying the composition of
protective antigens. Proteins and polysaccharides are
used as protective antigens in various vaccine
formulations. In the case of polysaccharides,
however, the high level of molecular variation makes
it difficult to effectively sustain vaccine
stability. Hence, in this study, major efforts were
directed towards identifying and testing protein
antigens.

Through this work, potential vaccine candidates were
found by screening the proteome of the SARS
coronavirus Tor2 strain. The small envelope protein
plays an important role in the replication of the
SARS Tor2 strain. The protein is hydrophobic in
nature and most of the amino acids were exposed on
the surface. The potential antigenic sites were
determined using various immunological approaches.
B-cell epitopes were identified, and the results
showed that the 30-36 region of the antigenic site
might be an epitope. Predicted antigenic peptides
with a score above 1 are considered to be potentially
antigenic and can elicit an immune response in the
human body by acting as epitopes for B cells. The
candidate peptide had a minimized energy of -485.873
kcal/mole, a satisfactory result. The chances that
this peptide will induce efficient vaccination
against SARS coronavirus are high because the small
envelope protein, a structural protein conserved in
almost all SARS coronavirus strains, is directly
involved in the replication and enveloping of the
virus. Thus a vaccine targeting this protein is
expected to block the life cycle of the virus and
prevent its spread as a pandemic. The envelope (E)
protein from coronaviruses is a small polypeptide
that contains at least one α-helical transmembrane
domain. Absence, or inactivation, of the E protein
results in attenuated viruses, due to alterations in
either virion morphology or tropism. Current work
emphasises that the E protein plays an important and
multifunctional role in the coronavirus virion life
cycle. The spike protein S is involved in viral
fusion with host cells, while the envelope protein E,
the membrane protein M, the nucleocapsid protein N
that binds to the RNA genome, and several additional
open reading frames are involved in viral
replication.